Skip to content

PDX-481: fix(prompts): rewrite author-test flow to single-call construction#173

Merged
mrdailey99 merged 2 commits into
developfrom
fix/PDX-481-rewrite-author-test-guidance
May 15, 2026
Merged

PDX-481: fix(prompts): rewrite author-test flow to single-call construction#173
mrdailey99 merged 2 commits into
developfrom
fix/PDX-481-rewrite-author-test-guidance

Conversation

@mrdailey99
Copy link
Copy Markdown
Collaborator

Summary

  • Rewrites the provar.guide.orchestration author-test flow and the PROVAR_TOOL_GUIDE.md "I want to write a new test" section to mandate single-call construction of test cases (full steps[] tree in one provar_testcase_generate call)
  • Marks provar_testcase_step_edit as amend-only — never for constructing new cases from an empty skeleton
  • Splits the orchestration prerequisite graph so generate (construct) and step_edit (amend) appear as distinct entry points
  • Mirrors the construct-vs-amend split in docs/mcp.md and adds pilot-guide Scenario 12 so the regression class is covered in evaluation
  • Adds a regression-guard unit suite (canonical phrasing + multi-scenario snapshot) and a JSON-RPC pdx-481-validate.cjs for protocol-surface verification

Jira

Why

The 1.5.0 regression (PDX-479) traced to two artifacts shipped in PR #153 that both steered LLMs toward a generate-empty + step_edit-per-step authoring pattern. That pattern caused scenarios to be dropped, asserts to be emitted as flat siblings instead of nested under UiWithScreen clauses, and assert API IDs to drift across the case. PDX-480 mechanically + behaviorally confirmed disabling those artifacts restores correct generation. This PR keeps the helpful guidance but rewrites it to the correct pattern.

Test plan

  • yarn compile clean
  • node_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts"1118 passing, 0 failing (12 new tests from this PR)
  • yarn lint clean
  • node scripts/pdx-481-validate.cjs9 / 9 PASS: author-test flow contains "single call", "ALL steps", "amend"; excludes "repeat per step"; prerequisite graph split confirmed; tool-guide resource serves the rewritten copy
  • node scripts/mcp-smoke.cjs — not run; environmental (global sf not linked to local worktree plugin) — verified the same blocker affects the develop tip, not introduced by this PR
  • Reviewer to run an MCP client (Claude Desktop / Cursor) against the local build with a multi-scenario authoring prompt and confirm single-call construction is followed and the resulting XML is structurally clean

Changes

  • src/mcp/prompts/guidePrompts.ts: rewrote 'author-test' flow + split the generate/step_edit lines in the prerequisite graph
  • docs/PROVAR_TOOL_GUIDE.md: rewrote "I want to write a new test" section
  • docs/mcp.md: added construct-pattern callout on provar_testcase_generate; added amend-only callout on provar_testcase_step_edit
  • docs/mcp-pilot-guide.md: added Scenario 12 "Construct a Multi-Scenario Test Case in a Single Call"
  • test/unit/mcp/guidePrompts.test.ts (new): registration + canonical-phrasing assertions for the author-test flow and the prerequisite graph
  • test/unit/mcp/testCaseGenerate.test.ts: added "multi-scenario single-call construction (PDX-481 regression guard)" describe block with 4 tests (12-step payload sequential testItemIds; scenario markers preserved; assert API ID consistency; non-SF target_uri nests correctly)
  • scripts/pdx-481-validate.cjs (new): JSON-RPC validation harness covering 9 protocol-surface assertions

Out of scope

  • testCaseGenerate.ts / testCaseStepTools.ts / testCaseValidate.ts are unchanged — they were byte-identical to the known-good 1.5.0-beta.17 build
  • Tool description hardening (PDX-482) lands separately as defence-in-depth

Customer-facing docs

docs/provar-mcp-public-docs.md and docs/university-of-provar-mcp-course.md are maintained separately by the Provar team and may need a mirror update if they describe the authoring sequence.

🤖 Generated with Claude Code

…uction

RCA: PR #153 (PDX-479 regression) shipped two artifacts — the provar.guide.orchestration prompt's author-test flow and the PROVAR_TOOL_GUIDE.md "I want to write a new test" section — that both steered LLMs toward generate-empty-then-step_edit-per-step authoring. PDX-480 confirmed locally that disabling these restores correct generation. This patch keeps the helpful guidance but rewrites it to recommend a single provar_testcase_generate call carrying the full step tree; step_edit is now explicitly marked amend-only.
Fix: Rewrote the author-test flow in src/mcp/prompts/guidePrompts.ts and the matching section in docs/PROVAR_TOOL_GUIDE.md to mandate single-call construction. Split the orchestration prerequisite graph so provar_testcase_generate and provar_testcase_step_edit appear as distinct entry points with construct-vs-amend annotations. Added construct-vs-amend callouts to docs/mcp.md tool sections. Added pilot-guide Scenario 12 covering the multi-scenario single-call expectation. Added unit tests asserting the canonical phrasing (and absence of "repeat per step") plus a multi-scenario snapshot test that drives 12 steps through provar_testcase_generate in one call and asserts consecutive testItemIds, preserved scenario markers, consistent assert API IDs, and correct UiWithScreen nesting. Added scripts/pdx-481-validate.cjs for direct JSON-RPC verification of the fix at the protocol surface.
Copilot AI review requested due to automatic review settings May 15, 2026 13:50
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 15, 2026

Quality Orchestrator

🟢 LOW · 2 / 100 · All changed files have mapped tests.


🧪 Tests to Run · Running 1 of 51 tests

  • unit/mcp/guidePrompts.test.ts
▶ Run command
npx vitest run \
  unit/mcp/guidePrompts.test.ts

⚡ quality-orchestrator  ·  /qo stub <file>  ·  qo analyze-local

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Rewrites the MCP author-test orchestration guidance and related documentation to mandate single-call test case construction via provar_testcase_generate (passing the full steps[] tree at once), and marks provar_testcase_step_edit as amend-only. This addresses the PDX-479 regression where step-by-step authoring caused dropped scenarios, flat asserts, and inconsistent step types. No production logic in testCaseGenerate.ts is modified — the change is to guidance copy plus regression-guard tests and a JSON-RPC validation script.

Changes:

  • Rewrite author-test flow in guidePrompts.ts and PROVAR_TOOL_GUIDE.md; split generate vs step_edit in the prerequisite graph and docs/mcp.md callouts; add Scenario 12 to the pilot guide.
  • Add new test suites: guidePrompts.test.ts (registration + canonical phrasing) and a multi-scenario regression block in testCaseGenerate.test.ts.
  • Add scripts/pdx-481-validate.cjs JSON-RPC harness to verify the rewritten copy and prerequisite-graph split via the MCP protocol surface.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/mcp/prompts/guidePrompts.ts Rewrites the author-test flow steps and splits generate/step_edit entries in the prerequisite graph.
docs/PROVAR_TOOL_GUIDE.md Rewrites "I want to write a new test" section with single-call construction guidance and amend-only step_edit usage.
docs/mcp.md Adds construct-pattern callout to provar_testcase_generate and amend-only callout to provar_testcase_step_edit.
docs/mcp-pilot-guide.md Adds Scenario 12 covering multi-scenario single-call construction for pilot evaluation.
test/unit/mcp/guidePrompts.test.ts New unit suite verifying prompt registration and canonical author-test phrasing.
test/unit/mcp/testCaseGenerate.test.ts Adds a "multi-scenario single-call construction" describe block with 4 regression-guard tests.
scripts/pdx-481-validate.cjs New JSON-RPC validation harness asserting the rewritten copy is reachable and contains canonical phrasing.
Comments suppressed due to low confidence (1)

test/unit/mcp/guidePrompts.test.ts:136

  • The regex relies on default | precedence which makes it provar_testcase_generate[^\n]*construct OR construct[^\n]*provar_testcase_generate. While this works, the unbounded [^\n]* can match across many tokens within the same line, so this could pass even if the words happen to appear on the same line in an unrelated context (e.g. another tool name between them). Consider tightening the regex (e.g. a bounded character window or matching specific punctuation like /() to ensure the test actually verifies that provar_testcase_generate is annotated as the construct entry point.
    assert.ok(
      /provar_testcase_generate[^\n]*construct|construct[^\n]*provar_testcase_generate/i.test(text),
      'prerequisite graph must annotate provar_testcase_generate as the construct entry point'
    );
    assert.ok(
      /provar_testcase_step_edit[^\n]*amend|amend[^\n]*provar_testcase_step_edit/i.test(text),
      'prerequisite graph must annotate provar_testcase_step_edit as the amend entry point'
    );

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/unit/mcp/guidePrompts.test.ts Outdated
Comment on lines +106 to +107
!text.includes('repeat as needed') || text.includes('amend'),
'author-test flow must not say "repeat as needed" without also clarifying step_edit is for amendments only'
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 706cb9f. The OR clause !text.includes("repeat as needed") || text.includes("amend") short-circuited to true on every run because "amend" appears throughout the rewritten author-test copy. Replaced with the unconditional !text.includes("repeat as needed") — the amend-only contract is already covered by the dedicated "author-test flow marks step_edit as amendment-only" test on line 113.

Comment on lines +139 to +152
let pass = 0;
let fail = 0;
for (const r of results) {
console.log(`${r.ok ? '[PASS]' : '[FAIL]'} ${r.label} — ${r.detail}`);
r.ok ? pass++ : fail++;
}
console.log(`\nPDX-481 validation: ${pass} passed, ${fail} failed`);

server.stdin.end();
process.exit(fail > 0 ? 1 : 0);
})().catch((err) => {
console.error('Validation script error:', err);
server.kill();
process.exit(2);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially addressed in 706cb9f. Replaced the ternary statement r.ok ? pass++ : fail++; with an explicit if/else block. On the cleanup concern: server.kill() is already called in the failure path (the script ends with .catch((err) => { console.error(...); server.kill(); process.exit(2); }) — see lines 152-156). That kills the child process before exit on rpc timeout / aborted handshake / any thrown error. Going further (synchronous wait for exit event before process.exit) would add complexity for a script that always exits immediately afterward and is only invoked as a one-shot validator, so leaving cleanup as-is.

RCA: Copilot review flagged two real defects in the PDX-481 regression-guard test and validation harness: (1) the "repeat as needed" assertion in guidePrompts.test.ts had an OR-clause that short-circuited to true because "amend" appears repeatedly elsewhere in the flow, making the assertion a no-op; (2) the prerequisite-graph regex used unbounded [^\n]* so it could match unrelated tokens between the two words on the same line; and one style nit on validate.cjs using a ternary as a statement.
Fix: Made the "repeat as needed" assertion unconditional so it actually protects against the anti-pattern phrasing being reintroduced. Tightened the prerequisite-graph regex to require the exact annotation punctuation (provar_testcase_generate\s*\(construct / provar_testcase_step_edit\s*\(amend) so it cannot pass on unrelated text. Replaced the ternary-as-statement counter in pdx-481-validate.cjs with an if/else block. Also added scripts/pdx-481-trace.cjs (the JSON-RPC trace harness used to capture the patched-vs-unpatched prompt-flow side-by-side that was posted to PDX-479 as concrete regression evidence). All gate checks pass: 1118 mocha tests, lint clean, validation 9/9.
@mrdailey99
Copy link
Copy Markdown
Collaborator Author

Copilot review follow-ups — addressed in 706cb9f

# File Copilot finding Action
1 test/unit/mcp/guidePrompts.test.ts:106 !text.includes('repeat as needed') || text.includes('amend') short-circuits to true ("amend" appears throughout the copy), making the assertion a no-op ✅ Fixed — unconditional !text.includes('repeat as needed'). The amend-only contract is still covered by the dedicated test on line 113
2 scripts/pdx-481-validate.cjs:152 r.ok ? pass++ : fail++; ternary-as-statement ✅ Fixed — explicit if (r.ok) pass++; else fail++; block
2 (cont.) same Server cleanup on timeout/abort ❌ Already handled — the .catch on the IIFE (lines 152-156) calls server.kill() before process.exit(2). Going further to wait for exit event would over-engineer a one-shot validator
3 (suppressed-low-confidence) test/unit/mcp/guidePrompts.test.ts:130, 134 Unbounded [^\n]* regex could match unrelated tokens between the two words on the same line ✅ Fixed — tightened to provar_testcase_generate\s*\(construct and provar_testcase_step_edit\s*\(amend, anchoring on the actual (annotation) punctuation emitted by the prerequisite-graph copy

Gate after fixes

  • yarn compile → clean
  • node_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts"1118 passing, 0 failing
  • yarn lint → clean
  • node scripts/pdx-481-validate.cjs9 / 9 PASS

Bonus inclusion

The follow-up commit also adds scripts/pdx-481-trace.cjs — the JSON-RPC trace harness used to capture the patched-vs-unpatched prompt-flow side-by-side that was posted to PDX-479 as concrete regression evidence. Useful for any reviewer who wants to re-verify locally:

node scripts/pdx-481-trace.cjs

Drives the patched server and dumps the exact bytes an MCP client surfaces to its LLM for: the orchestration prompt's author-test flow (TRACE 1), the tool-guide resource (TRACE 2), both tool descriptions (TRACEs 3 + 4), and a real multi-scenario provar_testcase_generate call (TRACE 5).

@mrdailey99 mrdailey99 merged commit 5276f5e into develop May 15, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants